(54) Queuing system and method of operation.

(57) A queuing system and method of operation
are provided that reduce latency and increase
efficiency in a general purpose queuing system.
The technique of the present invention is ap- 
plied in an intermediate node that receives an 
entity, such as information, from a first node, 
and transfers that entity to a second node. The 
technique comprises the steps of (a) receiving 
at the intermediate node (B) a first block of the 
entity sent by the first node (A) ; (b) upon receipt 
of the block, initiating the sending of a subse- 
quent block of the entity to the intermediate 
node ; (c) concurrently with step (b), transfer- 
ring the first block of the entity to the second 
node (C) ; (d) upon receipt of an acknowledge- 
ment from the second node (C), causing the 
intermediate node to transfer a portion of the 
entity to the second node (C), the portion trans- 
ferred being all of the entity that has at the time 
of the transferral been received by the inter- 
mediate node (B) from the first node (A) since 
the previous transfer was made ; and (e) repeat- 
ing steps (b) and (d) until all of the entity has 
been transferred. 

The above technique is adaptive to many 
environments and will optimize throughput for 
systems that need to transfer entities such as 
information. This system and method can han- 
dle mismatched flow problems from diverse 
environments and provides optimal flow for 
solutions that require guaranteed transfers. 
This algorithm can change and adapt to varying 
circumstances. It can be altered in real-time for 
communication systems. If the block size
changes, the modification does not alter the
smooth flow of the algorithm.
This invention relates to queuing systems and more particularly to the transporting of entities, such as items
or information, from one location to another using an intermediate queue.

As computer manufacturers develop faster and more efficient computer communication networks, there are
increasingly cases of mismatched bus and communication media speeds. One example is the IBM
MicroChannel and an IBM Fiber Channel (FCS) MicroChannel adapter. The IBM MicroChannel is capable of
sustaining approximately 50 megabytes/sec, whereas the IBM FCS adapter can support either 25 megabytes/sec
or 100 megabytes/sec. This mismatch also occurs with the MicroChannel and an IBM Token Ring adapter.
Because of these mismatches, data transfers can often be very inefficient with respect to the given
communication media speed. They can also be very efficient, but have a long delay in starting transmission.
This delay is often referred to as latency. These two problems, latency and efficiency, are classic in the
field of communications.

There are numerous applications which require optimization of either latency or throughput. There are also
those that require optimization of both. Customers are increasingly interested in low latency and very efficient
use of the communication media. The present state of the art fails to provide an adaptive yet simple throughput
mechanism between systems when trying to minimize latency and maximize efficiency.

It is therefore an object of the present invention to provide an improved queuing technique for an entity 
being transferred from a first node to a second node via an intermediate node. 

Accordingly the present invention provides a method of operating an intermediate node to receive an entity 
from a first node and to transfer the entity to a second node, the first and second nodes being connected to 
the intermediate node by transmission links, the method comprising the steps of:

(a) receiving at the intermediate node a first block of the entity sent by the first node; 

(b) upon receipt of the block, initiating the sending of a subsequent block of the entity to the intermediate 
node; 

(c) concurrently with step (b), transferring the first block of the entity to the second node; 

(d) upon receipt of an acknowledgement from the second node, causing the intermediate node to transfer
a portion of the entity to the second node, the portion transferred being all of the entity that has at the 
time of the transferral been received by the intermediate node from the first node since the previous trans- 
fer was made; and 

(e) repeating steps (b) and (d) until all of the entity has been transferred. 

Viewed from a second aspect the present invention provides a queuing system in an intermediate node
for receiving an entity from a first node and transferring the entity to a second node, the first and second nodes
being connected to the intermediate node by transmission links, the system comprising: reception means in
the intermediate node for receiving a first block of the entity sent by the first node; initiation means, responsive
to the reception means indicating receipt of the block, for initiating the sending of a subsequent block of the
entity to the intermediate node; a transfer means, operating concurrently with the initiation means, to transfer
the first block of the entity to the second node; the transfer means further, upon receipt by the intermediate
node of an acknowledgement from the second node, transferring a portion of the entity to the second node,
the portion transferred being all of the entity that has at the time of the transferral been received by the
intermediate node from the first node since the previous transfer was made; the initiation means and transfer
means repeating their functions until all of the entity has been transferred.

The present invention reduces latency and increases efficiency in a general purpose queuing system. The
present invention is adaptive to many environments and will optimize throughput for systems that need to
transfer information or other types of entities from point A to point C through intermediate point B. Example
environments for utilizing the invention described herein include transfer of data via a communication channel,
movement of people/equipment/goods via a transportation system, mail delivery scheduling, telephonic
switching, etc.

An intermediate node of a multi-node system controls information flowing through it by queuing received
information and transferring the received information to a subsequent node independent of the block size of
the information being transferred. Subsequent blocks of information are transferred upon completion of a
previously transferred block, rather than upon completion of an incoming block being received.

This procedure can handle mismatched flow problems from diverse environments and provides optimal 
flow for solutions that require guaranteed transfers. Better performing algorithms exist, but they cannot guar- 
antee that the element being transferred will get from system A to C. 

This procedure can change and adapt to varying circumstances. It can be altered in real-time for
communication systems. If the block size changes, the modification does not alter the smooth flow of the algorithmic
procedure. The block size could be changed by a customer desiring to have real-time control over latency and
throughput. In the case of IBM's FCS adapter, it may be desirable to expedite certain services and not others.
The procedure provides fine-tuned control over the data flowing through the system. When the setup time is very small,
one could use a standard communication meter and small block size to get good results. However, if the setup
time were sizable, the incurred overhead with a small block size would be very high. The invention disclosed
herein is better in both cases, especially the latter.

It can be seen that the present invention provides an adaptive flow control system. In preferred
embodiments the technique provides an efficient yet adaptive communication system, being able to match dissimilar
path speeds used for transporting information.

The present invention will be described further, by way of example only, with reference to an embodiment 
thereof as illustrated in the accompanying drawings, in which: 

Figure 1 is a block diagram of a system in accordance with the preferred embodiment of the invention,
including a sending, intermediate, and receiving node;

Figure 2 is a flow diagram of a simple algorithm used to transfer information between nodes;

Figure 3 is a flow diagram of a standard algorithm used to transfer information between nodes;

Figure 4 is a flow diagram of an adaptive flow algorithm used to transfer information between nodes in
accordance with the preferred embodiment of the invention;

Figure 5 is a block diagram of a multi-node environment, such as used in a switched telecommunication
system;

Figure 6 is a typical data processing system, which can provide the functionality of a sending and
intermediate node; and

Figure 7 is a block diagram of a communications adapter.

Referring initially to Figure 1, there are several parameters that should be defined before describing the
preferred system and method.

- Systems A (10) and B (20) communicate over link AB (12) with link speed M.

- Block moves between A and B are of size <= x.

- Systems B (20) and C (30) communicate over link BC (14) with link speed N.

- Block moves between B and C are of any size. Blocks can be any quantity of items/people/information
being conveyed or transferred between points.

- There exists a setup time for transfers between B and C of Ts.

- M and N are not necessarily equal.

- Y is the size of data transferred.

- Ttotal is the total time required in the transfer.

Simple Algorithm 

Referring to Figure 2, the simplest technique for transferring data from A (10) to C (30) is to:

- Transfer x from A to B (at 22)

- When x arrives at B (24), transfer x to C (at 26) and send acknowledgement to A (at 28)

- If done (32), exit (34); else go to the beginning (22)

The equation for Ttotal is:

Ttotal = Y/M + Y/N + Y*Ts/x
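
The simple algorithm's total time can be evaluated directly from the equation above. The following is a minimal sketch only; the function name `simple_total_time` and the convention that all quantities are expressed in consistent units are illustrative assumptions, not part of the description.

```python
def simple_total_time(Y, x, M, N, Ts):
    """Evaluate Ttotal = Y/M + Y/N + Y*Ts/x for the simple algorithm.

    Y: total size, x: block size, M: link AB speed, N: link BC speed,
    Ts: setup time per B-to-C transfer. Units are assumed to be consistent.
    """
    return Y / M + Y / N + Y * Ts / x


# Illustrative values only: 100 units sent in blocks of 10, M = 10, N = 5, Ts = 1.
print(simple_total_time(Y=100, x=10, M=10, N=5, Ts=1))  # 10 + 20 + 10 = 40.0
```

The three terms correspond to moving all of Y over link AB, moving all of Y over link BC, and paying one setup time for each of the Y/x blocks.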

Standard Algorithm

Referring to Figure 3, a technique at the next level of complexity would be:

- Transfer x from A to B (at 36)

- Dual transfer

  - When x arrives at B (38), send acknowledgement to A (40); when link BC is clear (42), transfer x to C (44)

  - and, when acknowledgement received from B (46), transfer another x from A to B (36)

- If done (48), exit; else go to the dual transfer (36)

The equation for Ttotal is:

Ttotal = x/M + Y/N + Y*Ts/x
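
As a hedged illustration of the standard algorithm's equation, the sketch below evaluates it for arbitrary inputs. The name `standard_total_time` is hypothetical, and consistent units are assumed for all arguments.

```python
def standard_total_time(Y, x, M, N, Ts):
    """Evaluate Ttotal = x/M + Y/N + Y*Ts/x for the standard algorithm.

    Only the first block's A-to-B time (x/M) appears, because later
    A-to-B transfers overlap with the B-to-C transfers.
    """
    return x / M + Y / N + Y * Ts / x


# Illustrative values only: 100 units sent in blocks of 10, M = 10, N = 5, Ts = 1.
print(standard_total_time(Y=100, x=10, M=10, N=5, Ts=1))  # 1 + 20 + 10 = 31.0
```

The per-block setup term Y*Ts/x is unchanged from the simple algorithm; only the A-to-B term shrinks from Y/M to x/M.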

Adaptive Flow Algorithm

The adaptive algorithm employed in the preferred embodiment of the present invention uses the ratio of
M to N, and a value p, where p = ceil(log(Y/x)/log(M/N))-1, ceil() is the ceiling function, and sigma-i(n) is the
sum from j=0 to i of n raised to the jth power.

p + 2 is the total number of transfers for the adaptive algorithm. Referring to Figure 4, the adaptive
algorithm flows as follows:

- Transfer x from A to B (52)

- First dual transfer

  - When x arrives at B (56), transfer x to C (58)

  - When x at B (54), send acknowledgement to A (60) to initiate (62) another transfer of x from A to B (52)

- Second through (p + 2)th dual transfer

  - Upon receipt of an acknowledgment from C (60), transfer whatever is at B to C (58). This portion is
    designated x', as determined at 56, where x' is larger or smaller than x due to the differing link
    speeds M and N.

  - When x at B (54), send acknowledgement to A (60) to initiate (62) another transfer of x from A to B (52)

- If done, exit; else go do the ith transfer

- Node C, upon receipt of block x' (64), sends an acknowledgment to B (66). The determination as to
whether the block x' has been received is made using any conventional technique known in the
communication art for conveying a length of data being sent within the data packet, such as in a packet
header file.

The equation for Ttotal using the adaptive algorithm is: 

Ttotal = x/M + Y/N + ceil(log(Y/x)/log(M/N) + 1)*Ts 
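
The closed-form expression above can be evaluated as in the following sketch. The function names are hypothetical, units are assumed consistent, and the restriction to M > N and Y > x is an assumption of this sketch (it keeps the logarithm ratio positive); the description itself only distinguishes the cases M/N != 1 and M/N = 1.

```python
import math


def adaptive_transfer_count(Y, x, M, N):
    """Return p + 2, with p = ceil(log(Y/x)/log(M/N)) - 1."""
    if M <= N or Y <= x:
        # Assumption of this sketch, not a statement from the description.
        raise ValueError("this sketch only handles M > N and Y > x")
    p = math.ceil(math.log(Y / x) / math.log(M / N)) - 1
    return p + 2


def adaptive_total_time(Y, x, M, N, Ts):
    """Evaluate Ttotal = x/M + Y/N + (p + 2)*Ts for the adaptive algorithm."""
    return x / M + Y / N + adaptive_transfer_count(Y, x, M, N) * Ts


# Illustrative values only: 100 units, first block of 10, M = 10, N = 5, Ts = 1.
print(adaptive_transfer_count(Y=100, x=10, M=10, N=5))    # 5
print(adaptive_total_time(Y=100, x=10, M=10, N=5, Ts=1))  # 1 + 20 + 5 = 26.0
```

Because ceil(v + 1) equals ceil(v) + 1, the factor ceil(log(Y/x)/log(M/N) + 1) in the equation is the same quantity as p + 2.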

Formula Derivation

Ttotal = x/M + Ts + x/N                     (1st transfer)
       + (M/N) * x/N + Ts                   (2nd transfer)
       + ...
       + (M/N)^(i-1) * x/N + Ts             (ith transfer)
       + ...
       + (M/N)^p * x/N + Ts                 ((p+1)st transfer)
       + (Y - x * sigma-p(M/N))/N + Ts      ((p+2)nd transfer)

Solving for Ttotal:

When M/N != 1:

Ttotal = x/M + (p+2)*Ts + Y/N and
p = ceil(log(Y/x)/log(M/N))-1

When M/N = 1:

The adaptive flow algorithm reduces to the standard algorithm.
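
The step from the term-by-term sum to the closed form can be checked numerically. The sketch below is illustrative only: it sums the per-transfer terms using exact rational arithmetic and compares the result with x/M + (p+2)*Ts + Y/N. The function names are hypothetical, p is passed in rather than computed, and the example values (Y/x = 1024, M/N = 2, p = 9) are assumptions chosen so that the final block has a positive size.

```python
from fractions import Fraction


def sigma(i, n):
    """sigma-i(n): the sum of n**j for j = 0 .. i."""
    return sum(n ** j for j in range(i + 1))


def term_by_term_total(Y, x, M, N, Ts, p):
    """Sum the p + 2 per-transfer terms of the derivation exactly."""
    r = Fraction(M, N)
    total = Fraction(x, M)                    # first block travels from A to B
    for i in range(p + 1):                    # transfers 1 .. p+1
        total += r ** i * Fraction(x, N) + Ts
    total += (Y - x * sigma(p, r)) / N + Ts   # transfer p+2 carries the remainder
    return total


def closed_form_total(Y, x, M, N, Ts, p):
    """Ttotal = x/M + (p + 2)*Ts + Y/N."""
    return Fraction(x, M) + (p + 2) * Ts + Fraction(Y, N)


args = dict(Y=1024, x=1, M=2, N=1, Ts=1, p=9)
print(term_by_term_total(**args) == closed_form_total(**args))  # True
```

The block sizes x*(M/N)^j for j = 0 .. p sum to x*sigma-p(M/N), so together with the remainder they always add up to Y, which is why the link BC terms collapse to Y/N.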

Tables 1-4 demonstrate transfer times for various communication channel scenarios using the above
described algorithms. Table 1 shows Ttotal for a 1 Megabyte file transferred using 1K blocks, where the channel
speed between A and B is 50 Megabytes/second and the channel speed between B and C is 25 Megabytes/second.
This table also shows two set-up time (Ts) examples (10 and 100 microseconds). Not only is the total
transfer time less using the adaptive algorithm, but overhead is minimized. The overhead % is defined as
(1 - (Ttotal / min(M,N) / Y)) * 100, where Y is the file size. The overhead ratio is defined as (overhead %)/(adaptive overhead %).
[Tables 1-4: comparison of the simple, standard and adaptive algorithms; table data not available]
Tables 2-4 similarly show various results when using the above described algorithms, for various file and
block sizes.

The adaptive algorithm can be implemented using standard programming techniques as follows. One need
only count the amount of data that has come from A to B (keep total at system/node B) while the transfer from
B to C is occurring. Once the B to C transfer is complete, send the total accounted for data at B (the portion
received and counted) on to C. Thus, only node B is concerned with the possibly dissimilar data rates of link
AB and link BC. Further, the block size can be dynamically changed at A without disrupting the adaptive
algorithm, as the actual block size being used in the transfer of information is not used by B when determining
whether to send information to C. This greatly simplifies system design by consolidating the transfer decision
at a single node independent of the actual block size being used. The block size could be changed to allow greater
control over the latency and throughput of a particular flow of information, or to expedite a particular item
through the system. The block size would be changed at the sending node, either manually by a user or
automatically by the sending node's controller or computer. As the other system node(s) queue and transfer
information irrespective of the block size, this size can be dynamically changed by the sender.
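
The bookkeeping described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the adapter's actual implementation: the class name `IntermediateNode`, its method names, and the callables `send_to_c` and `ack_to_a` are all hypothetical stand-ins for links BC and AB.

```python
class IntermediateNode:
    """Sketch of node B: forwards to C whatever has accumulated from A."""

    def __init__(self, send_to_c, ack_to_a):
        self._send_to_c = send_to_c    # hypothetical callable taking bytes
        self._ack_to_a = ack_to_a      # hypothetical callable, no arguments
        self._pending = bytearray()    # received from A, not yet sent to C
        self._awaiting_ack = False     # True while a B-to-C transfer is outstanding

    @property
    def pending_bytes(self):
        """Count of bytes received from A since the previous transfer to C."""
        return len(self._pending)

    def on_block_from_a(self, block):
        """Queue a block of any size and acknowledge it so A sends the next one."""
        self._pending.extend(block)
        self._ack_to_a()
        if not self._awaiting_ack:
            self._forward()

    def on_ack_from_c(self):
        """C has finished the previous transfer: send everything queued so far."""
        self._awaiting_ack = False
        if self._pending:
            self._forward()

    def _forward(self):
        portion = bytes(self._pending)
        self._pending.clear()
        self._awaiting_ack = True
        self._send_to_c(portion)


sent = []
node = IntermediateNode(send_to_c=sent.append, ack_to_a=lambda: None)
for block in (b"aa", b"bb", b"cc"):
    node.on_block_from_a(block)
node.on_ack_from_c()
print(sent)  # [b'aa', b'bbcc']
```

Node B never inspects the block size used by A; it only tracks how much has arrived, so A can change the block size between calls without any change to this logic.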

As shown in Figure 5, the technique of the preferred embodiment of this invention could similarly be
extended to a system having multiple intermediate nodes 80, such as in a switched point-to-point communication
system, with the adaptive algorithm running in each intermediate node (a node other than the originating 78 
or final 82 node). Thus, each intermediate node handles the data flow mismatch for its respective sending 
and receiving nodes. 

Figure 6 shows the preferred embodiment data processing system 84, which comprises a CPU 90, read
only memory 96, random access memory 94, I/O adapter 98, user interface adapter 102, communication
adapter 114, and display adapter 116, all interconnected via a common data path, or bus, 92. Each of the above
components accesses the common bus using conventional techniques known to those of ordinary skill in the
art, which include such methods as dedicating particular address ranges to each component in the system, with
the CPU being the bus master. Other conventional techniques known to those of ordinary skill in the art include
direct memory access, or DMA, used to transfer data at high speed from external devices such as DASD 100
or network 110 to the data processing system's random access memory (RAM) at 94. As is further shown in
Figure 6, these external devices 100 and 110 interface to the common bus 92 through respective adapters
98 and 114. Other external devices such as the display 118 similarly use an adapter 116 to provide data flow
between the bus 92 and the display 118. User interface means are provided by adapter 102, which has
attached thereto such items as a joystick 112, mouse 106, keyboard 104, and speaker 108. Each of these units
is well known as such and so will not be described in detail herein.

Figure 6 corresponds to the logical functions of Figure 1 in the following manner. Link 12 between system
A 10 and system B 20 corresponds to bus 92 of Figure 6. System A of Figure 1 is the sender of data, and could
be any of CPU 90, RAM 94, or I/O adapter 98 of Figure 6. In the preferred embodiment, data is provided to
the communications adapter 114 from RAM 94 using conventional DMA techniques across bus 92. Link 14 of
Figure 1 corresponds to network 110 of Figure 6. System C 30 of Figure 1 corresponds to a similar
communications adapter 114 in a similar data processing system 84 also residing on network 110. Other embodiments
of this invention could similarly use entire data processing systems 84 at each of Systems A, B, and C of Figure
1, interconnected using traditional communication techniques.

Figure 7 shows in greater detail the communication adapter 114, which enables the essential features of
System B (Figure 1) in the preferred embodiment. The adapter 114 is comprised of a microcontroller 122
coupled to a buffer 124, a transceiver 120 and a transceiver 126. Microcontrollers are commonly known in the
art, and comprise a CPU 121, read only memory 123 and random access memory 125. Transceivers are used
to interface to bus or network protocols by inserting/extracting the actual data to be transferred, as well as
handling status signalling, within the particular bus or network protocol, as is commonly known in the art. The
transceiver 120 receives data at 12 from the bus 92 of Figure 6. The transceiver 126 is an optical transceiver,
and link 14 is an optical fiber, although it is apparent that the system of the invention could employ any type
of transport mechanism. When data arrives at transceiver 120, it is buffered at 124, and the CPU is notified
at 128. The CPU 122 maintains a count of the number of bytes received across link 12. The CPU 122, upon
receipt of an acknowledgment at 130 which arrived across link 14 from System C (Figure 1), can initiate at
132 a transmittal of buffered information 124 across link 14 using transceiver 126.

The adaptive flow algorithm can be generalized to solve problems outside of the communications
environment. It can handle parts inventory/shipping problems, military troop movement, mail delivery scheduling,
and many other real world mismatched flow problems. In each case, the user defines the given parameter x
to yield an acceptable latency at the beginning, and then follows the algorithm to determine total flow time.
The simple and standard algorithms each are O(n) overhead algorithms, whereas the adaptive flow algorithm
is O(log(n)). Therefore, as n grows, the adaptive flow algorithm overhead time will grow as log(n) and the
others will grow as n. For large n, the first two algorithms require considerable processing and overhead
compared to the adaptive flow algorithm.
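
The difference in growth can be made concrete by counting setup operations. The sketch below is illustrative only and assumes that n denotes the ratio Y/x and that the speed ratio M/N is greater than 1; the name `setup_counts` is hypothetical. The linear count reflects the Y*Ts/x term of the simple and standard equations, and the adaptive count is p + 2 = ceil(log(n)/log(M/N)) + 1.

```python
import math


def setup_counts(n, ratio):
    """Return (linear, adaptive) setup counts for n = Y/x and ratio = M/N > 1."""
    linear = n                                                # one setup per block
    adaptive = math.ceil(math.log(n) / math.log(ratio)) + 1   # p + 2 transfers
    return linear, adaptive


for n in (1_000, 1_000_000):
    print(n, setup_counts(n, ratio=2))
```

With a speed ratio of 2, growing n by a factor of one thousand adds only about ten setups to the adaptive count, while the linear count grows a thousandfold.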
Claims

1. A method of operating an intermediate node (B) to receive an entity from a first node (A) and to transfer
the entity to a second node (C), the first and second nodes being connected to the intermediate node
by transmission links, the method comprising the steps of:

(a) receiving at the intermediate node (B) a first block of the entity sent by the first node (A);

(b) upon receipt of the block, initiating the sending of a subsequent block of the entity to the
intermediate node;

(c) concurrently with step (b), transferring the first block of the entity to the second node (C);

(d) upon receipt of an acknowledgement from the second node (C), causing the intermediate node to
transfer a portion of the entity to the second node (C), the portion transferred being all of the entity
that has at the time of the transferral been received by the intermediate node (B) from the first node
(A) since the previous transfer was made; and

(e) repeating steps (b) and (d) until all of the entity has been transferred.

2. A method as claimed in Claim 1, wherein the entity is information.

3. A method as claimed in Claim 2 wherein said information is of total length Y and comprises a plurality of
blocks having a block length "x".

4. A method as claimed in Claim 3 wherein the block length "x" comprises a plurality of data bytes, and a
count of the data bytes received at the intermediate node (B) is maintained in order to determine the length
of the portion to be transferred at step (d).

5. A method as claimed in any preceding claim wherein the transmission link between the first (A) and
intermediate (B) nodes operates at a different data rate to the transmission link between the intermediate
(B) and second (C) nodes.

6. A method as claimed in Claim 5, wherein the portion transferred at step (d) has a length different to the
block length of the blocks sent by the first node (A).

7. A method as claimed in any preceding claim wherein the initiating step (b) is carried out by sending an
acknowledgement of receipt of each block to the first node (A).

8. A queuing system in an intermediate node (B) for receiving an entity from a first node (A) and transferring
the entity to a second node (C), the first and second nodes being connected to the intermediate node
by transmission links, the system comprising:

reception means in the intermediate node (B) for receiving a first block of the entity sent by the first node
(A);

initiation means, responsive to the reception means indicating receipt of the block, for initiating the
sending of a subsequent block of the entity to the intermediate node;

a transfer means, operating concurrently with the initiation means, to transfer the first block of the entity
to the second node (C);

the transfer means further, upon receipt by the intermediate node (B) of an acknowledgement from the
second node (C), transferring a portion of the entity to the second node (C), the portion transferred being
all of the entity that has at the time of the transferral been received by the intermediate node (B) from
the first node (A) since the previous transfer was made;

the initiation means and transfer means repeating their functions until all of the entity has been
transferred.

9. A system as claimed in Claim 8, wherein the entity is information.

10. A system as claimed in Claim 9 wherein said information is of total length Y and comprises a plurality of
blocks having a block length "x".

11. A system as claimed in Claim 10 wherein the block length "x" comprises a plurality of data bytes, and a
count of the data bytes received at the intermediate node (B) is maintained in order to determine the length
of the portion to be transferred by the transfer means.
12. A system as claimed in any of claims 8 to 11, wherein the transmission link between the first (A) and
intermediate (B) nodes operates at a different data rate to the transmission link between the intermediate
(B) and second (C) nodes, and the portion transferred by the transfer means has a length different to
the block length of the blocks sent by the first node (A).

13. A system as claimed in any of claims 8 to 12, wherein the initiation means initiates the sending of the
subsequent block by sending an acknowledgement of receipt of each block to the first node (A).